
Integration of a web-based rating system with an oral proficiency interview test: argument-based approach to validation



Abstract

This dissertation focuses on the validation of the Oral Proficiency Interview (OPI), a component of the Oral English Certification Test for international teaching assistants. The rating of oral responses was implemented through an innovative computer technology: a web-based rating system called Rater-Platform (R-Plat). The main purpose of the dissertation was to investigate the validity of interpretations and uses of OPI scores derived from raters’ assessment of examinees’ performance during the web-based rating process. Following the argument-based validation approach (Kane, 2006), an interpretive argument for the OPI was constructed. The interpretive argument specifies a series of inferences, warrants for each inference, and the underlying assumptions and specific types of backing necessary to support those assumptions. Of seven inferences—domain description, evaluation, generalization, extrapolation, explanation, utilization, and impact—this study focuses on two. Specifically, it aims to obtain validity evidence for three assumptions underlying the evaluation inference and for three assumptions underlying the generalization inference. The research questions addressed: (1) raters’ perceptions of R-Plat in terms of clarity, effectiveness, satisfaction, and comfort level; (2) the quality of raters’ diagnostic descriptor markings; (3) the quality of raters’ comments; (4) the quality of OPI scores; (5) the quality of individual raters’ OPI ratings; (6) prompt difficulty; and (7) raters’ rating practices.

A mixed-methods design was employed to collect and analyze qualitative and quantitative data. Qualitative data consisted of: (a) 14 raters’ responses to open-ended questions about their perceptions of R-Plat; (b) 5 recordings of individual and focus-group interviews eliciting raters’ perceptions; and (c) 1,900 evaluative units extracted from raters’ comments about examinees’ speaking performance. Quantitative data included: (a) 14 raters’ responses to six-point scale statements about their perceptions; (b) 2,524 diagnostic descriptor markings of examinees’ speaking ability; (c) OPI scores for 279 examinees; (d) 803 individual raters’ ratings; (e) individual prompt ratings, divided by intended prompt level, given by each rater; and (f) individual raters’ ratings on the given prompts, grouped by test administration.

The results showed that the assumptions for the evaluation inference were supported. Raters’ responses to the questionnaire and the individual and focus-group interviews revealed positive attitudes towards R-Plat. Diagnostic descriptor markings and raters’ comments, analyzed by chi-square tests, distinguished different speaking ability levels. OPI scores were distributed across different proficiency levels throughout different test administrations. For the generalization inference, both positive and negative evidence was obtained. Many-facet Rasch measurement (MFRM) analyses showed that OPI scores reliably separated examinees into different speaking ability levels. Observed prompt difficulty matched intended prompt levels, although several problematic prompts were identified. Finally, while raters applied the rating scales with adequate consistency within the same test administration, they were not consistent in their severity. Overall, the foundational parts of the validity argument were successfully established.

The findings of this study allow for moving forward with the investigation of the subsequent inferences in order to construct a complete OPI validity argument. They also suggest important implications for argument-based validation research, for the study of rater and task variability, and for future applications of web-based rating systems for speaking assessment.
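The chi-square analysis of diagnostic descriptor markings mentioned above can be sketched in Python using only the standard library. The contingency table below is an invented placeholder for illustration—the study's actual descriptor categories and counts are not reproduced here.

```python
# Hypothetical contingency table: rows = diagnostic descriptor categories,
# columns = proficiency levels assigned to examinees. All counts are invented.
observed = [
    [30, 10, 5],
    [12, 25, 8],
    [4, 9, 27],
]

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E, where the
# expected count E = row_total * col_total / grand_total under independence.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (obs - expected) ** 2 / expected

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {chi_sq:.2f}, df = {df}")
```

A large statistic relative to the chi-square distribution with these degrees of freedom would indicate that descriptor markings are associated with proficiency level, i.e., that they distinguish ability levels rather than being assigned uniformly.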

Bibliographic details

  • Author

    Yang, Hye Jin

  • Affiliation
  • Year 2016
  • Total pages
  • Original format PDF
  • Language en
  • Classification
